This paper is concerned with deriving the limit distributions of stopping times devised to
sequentially uncover structural breaks in the parameters of an autoregressive moving average,
ARMA, time series. The stopping rules are defined as the first time lag for which detectors,
based on CUSUMs and Page's CUSUMs for residuals, exceed the value of a prescribed threshold
function. It is shown that the limit distributions crucially depend on a drift term induced
by the underlying ARMA parameters. The precise form of the asymptotics is determined by an
interplay between the location of the break point and the size of the change implied by the drift.
The theoretical results are accompanied by a simulation study and applications to
electroencephalography, EEG, and IBM data. The empirical results indicate a satisfactory behavior in
finite samples.

1. Introduction

Sequential change-point analysis is concerned with uncovering, in an on-line fashion, what
are called structural breaks, that is, deviations from a pre-specified in-control scenario. For the
case of time series, relevant for this paper, the natural in-control scenario is the
stationarity of the underlying stochastic process. More traditionally, sequential change-point
techniques were developed for breaks in the mean and variance in sequences of independent
observations. The corresponding literature is reviewed in the monographs Basseville
and Nikiforov [6] and Csörgő and Horváth [12]. A more recent survey of both sequential
and historical procedures is given in Aue and Horváth [3].

The particular approach to sequential change-point analysis of this paper is grounded
in the work of Chu et al. [11], who developed procedures using a training sample to
estimate an initial model and to monitor for deviations from that model as soon as new
observations arrive. This contribution, originally written with applications to econometric
data in mind, has been extended in a number of ways. Further sequential procedures
covering financial time series were discussed in Andreou and Ghysels [1], and Aue et al.
[4]. Berkes et al. [8] introduced methodology applicable to GARCH processes. Gombay
and Serban [18] worked with autoregressive processes, while Gombay and Horváth [17]
considered weakly stationary time series. Refinements using bootstrap were considered
in Kirch [23] and Hušková and Kirch [20], while resampling schemes were studied by
Hušková et al. [21].

The basic time series model utilized in this paper is the class of linear autoregressive
moving average, ARMA, processes made popular through the works of Box et
al. [9]. ARMA processes find widespread applications in a number of fields as evidenced,
for example, in the recent text Shumway and Stoffer [29]. As advocated by Brown et
al. [10] in a regression setting, the proposed monitoring procedures are based on the
residuals obtained from an ARMA model fit to the original data based on a training
sample of size m for which stationarity of the underlying process is assumed. If the
process remains stationary after the monitoring starts, then residuals of the training period
and the monitoring period should possess similar properties. The test procedures to be
introduced here are based on traditional cumulative sum, CUSUM, statistics and a
modification, Page's CUSUM statistics (see Page [26, 27]). The latter tend to react faster to
deviations from the in-control scenario and satisfy certain optimality criteria (see Lorden
[24]). CUSUMs for residuals of ARMA processes were discussed in a retrospective setting
in Bai [5], Yu [34] and Robbins et al. [28], and in a sequential framework in Dienes and
Aue [14]. Recent work on Page's CUSUMs can be found in Fremdt [15, 16].

A stopping rule is then defined as a first crossing time, that is, the time lag for which
either the CUSUM or Page's CUSUM statistic exceeds a threshold value tolerable for the
in-control case. The focus of this paper is on deriving the asymptotic distributions of these
stopping rules for the situation that deviations from stationarity of the underlying process
occur. The particular deviations of interest are the classic change in mean and general
changes in the second-order dynamics, with an emphasis on changes in the variance (or
scale) due to the nature of the data examples provided in this paper. Namely, the
finite-sample properties of the proposed methods are discussed in two case studies. The first
of the applications involves EEG data. Here interest is in detecting the occurrence of an
epileptic seizure (see Davis et al. [13]). The second application deals with closing prices
of IBM stock, a classic data set that has been analyzed with historical procedures for the
presence of breaks in variance (see, e.g., Tsay [31]). Accompanying simulation evidence
indicates that the procedure works satisfactorily for these two examples.

The paper is organized as follows. Section 2 details the ARMA model and states the
hypotheses to be tested. Section 3 quantifies the large-sample behavior of the delay times
incurred by the CUSUM and Page's CUSUM procedures. Applications to EEG and IBM
data are discussed in Section 4. All proofs are given in Section 5.

Let $\mathbb{Z}$ denote the set of integers. In what follows, $(Y_t: t \in \mathbb{Z})$ denotes the ARMA($p$, $q$)
process specified by the stochastic recurrence equations


\[ \phi_t(B)(Y_t - \mu_t) = \theta_t(B)\varepsilon_t, \qquad t \in \mathbb{Z}, \tag{2.1} \]

where $\mu_t$ are mean parameters, $\phi_t(z) = 1 - \phi_{t,1}z - \cdots - \phi_{t,p}z^p$ and
$\theta_t(z) = 1 + \theta_{t,1}z + \cdots + \theta_{t,q}z^q$ denote respectively the autoregressive and moving
average polynomials, and $B$ the backshift operator. The innovations $(\varepsilon_t: t \in \mathbb{Z})$ are assumed
to be independent random variables with zero mean and variance $\sigma_t^2$. As usual, it is further
required that $\phi_t$ and $\theta_t$ have no common zeroes and that the ARMA process is causal and
invertible, which means

\[ \phi_t(z) \neq 0 \quad\text{and}\quad \theta_t(z) \neq 0 \qquad \text{for all } |z| \leq 1. \tag{2.2} \]

The parametric model in (2.1) depends on the parameter vectors $\xi_t = (\mu_t, \phi_t', \theta_t', \sigma_t)'$,
where $\phi_t = (\phi_{t,1}, \ldots, \phi_{t,p})'$ and $\theta_t = (\theta_{t,1}, \ldots, \theta_{t,q})'$, with $'$ denoting transposition. These
vectors may be time dependent and interest is in monitoring the constancy of the $\xi_t$ in a
sequential fashion. This is important because constancy of the $\xi_t$ would imply stationarity
of the underlying ARMA process, so that standard methods are available for estimation
and prediction purposes. To set up the monitoring, a training period of size $m + p$ is
utilized for which

\[ Y_{1-p}, \ldots, Y_m \text{ are governed by } \xi_0 = (\mu_0, \phi_0', \theta_0', \sigma_0)'. \tag{2.3} \]

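As Chu et al. [11] elaborate, this training period may be used to estimate the parameters
of an initial non-contaminated model and to express limit results in the form $m \to \infty$.
In particular, let $(X_t: t \in \mathbb{Z})$ be the centered sequence defined by $X_t = Y_t - \mu_t$ and
define $\hat\xi_m = (\hat\mu_m, \hat\phi_m', \hat\theta_m', \hat\sigma_m)'$ to be a $\sqrt{m}$-consistent estimator for $\xi_0$ obtained from the
training period data. This gives the model residuals

\[ \hat\varepsilon_t = \hat X_t - \sum_{i=1}^{p} \hat\phi_{m,i}\hat X_{t-i} - \sum_{j=1}^{q} \hat\theta_{m,j}\hat\varepsilon_{t-j}, \]

with $\hat X_t = Y_t - \hat\mu_m$ and initializations $\hat\varepsilon_{-q+1} = \cdots = \hat\varepsilon_0 = 0$ in case $q > 0$.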
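
The residual recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `arma_residuals` and its argument layout are hypothetical, the parameter estimates are taken as given inputs rather than computed, and the pre-sample residuals are set to zero as in the initialization just described.

```python
def arma_residuals(y, mu_hat, phi_hat, theta_hat):
    """Residuals eps_t = X_t - sum_i phi_i X_{t-i} - sum_j theta_j eps_{t-j}.

    y holds Y_{1-p}, ..., Y_n, so its first p entries are pre-sample values.
    Returns eps_1, ..., eps_n; residuals with index <= 0 are treated as zero.
    (Hypothetical helper for illustration, not code from the paper.)
    """
    p, q = len(phi_hat), len(theta_hat)
    x = [value - mu_hat for value in y]      # centered observations
    n = len(x) - p
    eps = [0.0] * (n + q)                    # first q slots hold eps_{1-q}, ..., eps_0 = 0
    for t in range(n):
        # x[p + t] is X_{t+1}; lag i + 1 sits at index p + t - 1 - i
        ar = sum(phi_hat[i] * x[p + t - 1 - i] for i in range(p))
        ma = sum(theta_hat[j] * eps[q + t - 1 - j] for j in range(q))
        eps[q + t] = x[p + t] - ar - ma
    return eps[q:]
```

When the data are generated from the same recursion with zero starting values and the true parameters are plugged in, the function returns the innovations exactly, which is a convenient sanity check.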

In the following, two sets of hypotheses will be considered. First, the focus will be on
the arguably most studied case for which only mean breaks are permitted. The sequential
testing problem then becomes

$H_0$: $Y_{m+1}, Y_{m+2}, \ldots$ have mean $\mu_0$;

$H_A^\mu$: $Y_{m+1}, \ldots, Y_{m+k^*-1}$ have mean $\mu_0$, but
$Y_{m+k^*}, Y_{m+k^*+1}, \ldots$ have mean $\mu_A \neq \mu_0$,

where here the constancy of the remaining ARMA model parameters is required, so that
changes may only affect the mean. It may sometimes be of greater importance to test
for changes in the underlying second-order dynamics. This can be done via testing the
general sequential hypotheses

$H_0$: $Y_{m+1}, Y_{m+2}, \ldots$ are governed by $\xi_0$;

$H_A$: $Y_{m+1}, \ldots, Y_{m+k^*-1}$ are governed by $\xi_0$, but
$Y_{m+k^*}, Y_{m+k^*+1}, \ldots$ are governed by $\xi_A \neq \xi_0$.

Under $H_A$, the decomposition $\xi_A = \xi_0 + \delta_m$ will be utilized, where
$\delta_m = (\delta_m^\mu, \delta_m^{\phi\prime}, \delta_m^{\theta\prime}, \delta_m^\sigma)'$ denotes the difference in parameter values.

For both sets of hypotheses, one can now proceed as follows. If the respective null
scenarios hold, then the residuals $\hat\varepsilon_t$ should roughly resemble the corresponding innovations
$\varepsilon_t$ and suitably constructed statistics should therefore behave similarly on the training
period and after monitoring commences. Under the alternatives, this should not be the
case. This approach will be detailed in the next section.

3. Monitoring schemes and their large-sample properties

3.1. CUSUM and Page’s CUSUM procedures under the null 

Testing procedures for the set of hypotheses introduced in the previous section are
commonly defined as stopping times that reject the null if a detector crosses the boundary
prescribed by a threshold function. Popular choices for the detector are based on
cumulative sum, CUSUM, statistics and on its variant, called Page's CUSUM. Let $\mathbb{N}$ denote
the positive integers. To introduce the CUSUM of (squared) residual procedures, define
for $k \in \mathbb{N}$ the detectors

\[ \hat D_\mu(m,k) = \sum_{t=m+1}^{m+k} \hat\varepsilon_t - \frac{k}{m}\sum_{t=1}^{m} \hat\varepsilon_t, \qquad
   \hat D_\sigma(m,k) = \sum_{t=m+1}^{m+k} \hat\varepsilon_t^2 - \frac{k}{m}\sum_{t=1}^{m} \hat\varepsilon_t^2. \tag{3.1} \]

The detector $\hat D_\mu(m,k)$ is built from the residuals $\hat\varepsilon_t$ and used to test $H_0$ against $H_A^\mu$,
while the detector $\hat D_\sigma(m,k)$ is built from the squared residuals $\hat\varepsilon_t^2$ and used to test $H_0$ against
$H_A$. Using the class of weight functions

\[ g_\gamma(m,k) = \sqrt{m}\Bigl(1 + \frac{k}{m}\Bigr)\Bigl(\frac{k}{m+k}\Bigr)^{\gamma}, \tag{3.2} \]

indexed by a sensitivity parameter $\gamma \in [0, 1/2)$, a stopping time corresponding to the
detector $\hat D_\mu(m,k)$ can be defined by

\[ \tau_\mu(m) = \min\{k \in \mathbb{N}: |\hat D_\mu(m,k)| > c_\alpha \hat\sigma_m g_\gamma(m,k)\}, \tag{3.3} \]

where $c_\alpha = c_\alpha(\gamma)$ is a critical constant, derived from the limit distribution of the detector
under $H_0$ (see Theorem 3.1 below), ensuring that $\lim_{m\to\infty} P(\tau_\mu(m) < \infty) = \alpha$ for a given level
$\alpha \in (0,1)$.

The stopping time $\tau_\sigma(m)$ for the detector $\hat D_\sigma(m,k)$ is defined analogously: Let $\hat\eta_m^2$
denote a weakly consistent estimator of the quantity $\eta^2 = E[(\varepsilon_1^2 - \sigma_0^2)^2]$. Then $\tau_\sigma(m)$ is
given by replacing $\hat D_\mu(m,k)$ and $\hat\sigma_m$ with $\hat D_\sigma(m,k)$ and $\hat\eta_m$, respectively.

Page's CUSUM procedure is a modification of the CUSUM detectors in (3.1) based on
the adjusted detectors

\[ \hat D_\mu^P(m,k) = \max_{0 \leq k' \leq k} |\hat D_\mu(m,k) - \hat D_\mu(m,k')|, \qquad
   \hat D_\sigma^P(m,k) = \max_{0 \leq k' \leq k} |\hat D_\sigma(m,k) - \hat D_\sigma(m,k')|, \tag{3.4} \]

setting $\hat D_\mu(m,0) = \hat D_\sigma(m,0) = 0$. Utilizing the same class of weight functions in (3.2) as
before gives rise to the Page-type stopping time

\[ \tau_\mu^P(m) = \min\{k \in \mathbb{N}: \hat D_\mu^P(m,k) > c_\alpha^P \hat\sigma_m g_\gamma(m,k)\}, \tag{3.5} \]

where $c_\alpha^P = c_\alpha^P(\gamma)$ controls again the level of the sequential procedure. The stopping time
$\tau_\sigma^P(m)$ is defined in a similar fashion. These sequential detectors were introduced in the
seminal papers (Page [26, 27]).
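
To make the two stopping rules concrete, the following Python sketch evaluates the CUSUM and Page detectors recursively and reports the first crossing of the weighted boundary. It is a minimal illustration, not the authors' implementation: the names `weight` and `stopping_times` are hypothetical, the scale estimate and the two critical values are assumed to be supplied by the user (in practice taken from the tables cited below), and a return value of `None` means that no alarm was raised on the supplied monitoring sample.

```python
import math

def weight(m, k, gamma):
    """Weight function sqrt(m) * (1 + k/m) * (k / (m + k))**gamma, as in (3.2)."""
    return math.sqrt(m) * (1.0 + k / m) * (k / (m + k)) ** gamma

def stopping_times(train, monitor, scale, crit_cusum, crit_page, gamma):
    """Return (CUSUM stopping time, Page stopping time); None means no alarm.

    train   -- residuals (or squared residuals) from the training period
    monitor -- residuals (or squared residuals) arriving during monitoring
    scale   -- estimate of sigma (or of eta when squared residuals are used)
    The critical values are inputs here; they are not computed by this sketch.
    """
    m = len(train)
    train_mean = sum(train) / m
    tau_cusum = tau_page = None
    d = 0.0                       # detector D(m, k), with D(m, 0) = 0
    d_min = d_max = 0.0           # extremes of D(m, k') over 0 <= k' <= k
    for k, value in enumerate(monitor, start=1):
        d += value - train_mean   # sum over monitoring minus (k/m) * training sum
        d_min, d_max = min(d_min, d), max(d_max, d)
        page = max(d - d_min, d_max - d)   # max over k' of |D(m,k) - D(m,k')|
        bound = scale * weight(m, k, gamma)
        if tau_cusum is None and abs(d) > crit_cusum * bound:
            tau_cusum = k
        if tau_page is None and page > crit_page * bound:
            tau_page = k
        if tau_cusum is not None and tau_page is not None:
            break
    return tau_cusum, tau_page
```

Tracking the running minimum and maximum of the CUSUM detector gives Page's detector in constant time per observation, which reflects the resetting behavior that lets it react faster when a change occurs late.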

All procedures are based on residuals instead of directly on the observations. This 
has the advantage that the notoriously difficult estimation of long-run variances of the 
dependent observations can be completely avoided. Better size and power properties are 
expected from this approach as pointed out in Robbins et al. [28], who confirmed these 
statements in an extensive simulation study. 

The large-sample behavior under the null hypotheses for the four detectors is quantified 
in the following two theorems, the first one of which states the results for the mean only 
procedures. 


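Theorem 3.1. Let $(Y_t: t \in \mathbb{Z})$ follow the ARMA equations (2.1) and assume that
$E[|\varepsilon_1|^\nu] < \infty$ for some $\nu > 2$. Then it holds under $H_0$ and for all real $c$ that

(a) \[ \lim_{m\to\infty} P\Bigl( \frac{1}{\hat\sigma_m} \sup_{k \geq 1} \frac{|\hat D_\mu(m,k)|}{g_\gamma(m,k)} \leq c \Bigr)
       = P\Bigl( \sup_{0 < x < 1} \frac{|W(x)|}{x^\gamma} \leq c \Bigr); \]

(b) \[ \lim_{m\to\infty} P\Bigl( \frac{1}{\hat\sigma_m} \sup_{k \geq 1} \frac{\hat D_\mu^P(m,k)}{g_\gamma(m,k)} \leq c \Bigr)
       = P\Bigl( \sup_{0 < x < 1} \sup_{0 \leq y \leq x} \frac{1}{x^\gamma} \Bigl| W(x) - \frac{1-x}{1-y} W(y) \Bigr| \leq c \Bigr), \]

where $(W(x): x \in [0,1])$ denotes a standard Brownian motion.

Theorem 3.2. Let $(Y_t: t \in \mathbb{Z})$ follow the ARMA equations (2.1) and assume that
$E[|\varepsilon_1|^\nu] < \infty$ for some $\nu > 4$. Then, under $H_0$ and for all real $c$, the limit results of
Theorem 3.1 are retained if $\hat D_\mu(m,k)$, $\hat D_\mu^P(m,k)$ and $\hat\sigma_m$ are replaced with the respective
objects $\hat D_\sigma(m,k)$, $\hat D_\sigma^P(m,k)$ and $\hat\eta_m$.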










Aue, Dienes, Fremdt and Steinebach 


The proofs of the theorems follow from the results in Dienes and Aue [14] for the
CUSUM procedure, and from a combination of the latter with the proofs in Fremdt [16]
for Page's CUSUM procedure. Tables containing simulated critical values for a selection
of sensitivity parameters $\gamma$ and test levels $\alpha$ can be found in Horváth et al. [19] for the
limit in Theorem 3.1, part (a) and in Fremdt [16] for the limit in part (b).
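
Critical values of this kind are quantiles of the limit variables in Theorem 3.1. As a rough illustration only, the following Python sketch approximates a quantile of the part (a) limit, the supremum of $|W(x)|/x^\gamma$, by simulating Brownian motion on an equidistant grid. The function name, grid size, number of replications and seed are assumptions made for this sketch. The discretization tends to underestimate the supremum, so the output is not a substitute for the tabulated values in the cited references.

```python
import math
import random

def simulate_cusum_quantile(gamma, level, n_grid=500, n_rep=2000, seed=1):
    """Monte Carlo approximation of the (1 - level)-quantile of
    sup_{0 < x <= 1} |W(x)| / x**gamma on an equidistant grid (illustrative only)."""
    rng = random.Random(seed)
    step_sd = math.sqrt(1.0 / n_grid)      # standard deviation of one increment
    sups = []
    for _ in range(n_rep):
        w, best = 0.0, 0.0
        for i in range(1, n_grid + 1):
            w += rng.gauss(0.0, step_sd)   # Brownian motion at x = i / n_grid
            best = max(best, abs(w) / (i / n_grid) ** gamma)
        sups.append(best)
    sups.sort()
    # empirical quantile: order statistic number ceil((1 - level) * n_rep)
    index = min(n_rep - 1, int(math.ceil((1.0 - level) * n_rep)) - 1)
    return sups[index]
```

A finer grid and more replications reduce both the discretization bias and the Monte Carlo error, at the price of a longer running time.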


3.2. Limiting delay times for mean breaks 


The quality of monitoring procedures is often quantified via the mean delay time which
measures how long, on average, one has to wait before the structural break in the
underlying processes is detected. For example, certain optimality criteria for Page's CUSUM
were developed in Lorden [24]. The monograph by Basseville and Nikiforov [6] gives an
account of the subsequent contributions in this area. The main theoretical contribution
of this paper is the derivation of the complete limit distribution of the stopping times
under consideration. Taking the mean of this distribution, one obtains in particular also
the information on the average delay time. Related results in the literature are Aue and
Horváth [2], Aue et al. [4] and Fremdt [16]. To account for the ARMA time series
character, modifications of the methodology in these papers become necessary. These will be
developed in the following.

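It is subsequently assumed that $H_A^\mu$ holds and that thus changes in the second-order
structure of the ARMA process do not occur. Notice that assumption (2.2) implies that
the reciprocals of $\phi_t(z)$ and $\theta_t(z)$ admit, for $|z| \leq 1$, the power series expansions

\[ \frac{1}{\phi_t(z)} = \sum_{\ell=0}^{\infty} \pi_\ell(\phi_t) z^\ell \quad\text{and}\quad
   \frac{1}{\theta_t(z)} = \sum_{\ell=0}^{\infty} \psi_\ell(\theta_t) z^\ell. \tag{3.6} \]

Denoting the training period estimates of the autoregressive and moving average
polynomials by $\hat\phi_m(z)$ and $\hat\theta_m(z)$, for large enough $m$, one finds analogously power series
expansions for their reciprocals. These will be written as

\[ \frac{1}{\hat\phi_m(z)} = \sum_{\ell=0}^{\infty} \pi_\ell(\hat\phi_m) z^\ell \quad\text{and}\quad
   \frac{1}{\hat\theta_m(z)} = \sum_{\ell=0}^{\infty} \psi_\ell(\hat\theta_m) z^\ell. \tag{3.7} \]

Under $H_A^\mu$, the asymptotic behavior of the delay time will depend on the size of the mean
change $\delta_m^\mu = \mu_A - \mu_0$, which in turn induces the drift term

\[ \Delta_m^\mu = \delta_m^\mu \Bigl(1 - \sum_{i=1}^{p} \phi_{0,i}\Bigr) \sum_{\ell=0}^{\infty} \psi_\ell(\theta_0)
   = \delta_m^\mu \frac{\phi_0(1)}{\theta_0(1)}. \tag{3.8} \]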
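
The two expressions for the drift term in (3.8) can be compared numerically. The Python sketch below is an illustration with hypothetical helper names and made-up parameter values: it computes the coefficients of the reciprocal of the moving average polynomial by the usual recursion, sums a truncated version of the series, and checks the result against the closed form with the polynomials evaluated at one.

```python
def inverse_ma_coefficients(theta, n_terms):
    """First n_terms coefficients of 1 / theta(z), theta(z) = 1 + sum_j theta_j z**j."""
    psi = [1.0]
    for ell in range(1, n_terms):
        # from theta(z) * (1 / theta(z)) = 1: psi_ell = -sum_j theta_j * psi_{ell - j}
        psi.append(-sum(theta[j - 1] * psi[ell - j]
                        for j in range(1, min(ell, len(theta)) + 1)))
    return psi

def mean_drift(delta_mu, phi, theta):
    """Closed form delta * phi(1) / theta(1) with phi(1) = 1 - sum(phi), theta(1) = 1 + sum(theta)."""
    return delta_mu * (1.0 - sum(phi)) / (1.0 + sum(theta))

# Illustrative (made-up) ARMA(1,1) values, not estimates from the paper.
delta_mu, phi, theta = 2.0, [0.5], [0.4]
series_form = delta_mu * (1.0 - sum(phi)) * sum(inverse_ma_coefficients(theta, 200))
closed_form = mean_drift(delta_mu, phi, theta)
```

For an invertible moving average polynomial the coefficients decay geometrically, so a moderate truncation point already reproduces the closed form to high accuracy.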


Note that the difference between the pre-change and post-change means is allowed to depend
on $m$, which the subscript in $\delta_m^\mu$ makes explicit. The precise limit distribution will crucially depend
on the interplay between the behavior of the drift term $\Delta_m^\mu$ and the location of the mean
change $k^*$. This leads to the following set of assumptions which, in view of the theorems
to come, are formulated for a general sequence $\Delta_m$ and not directly for $\Delta_m^\mu$. Superscripts,
such as $\mu$ here, will indicate which drift term is being used.

Assumption 3.1. It is required that

(a) there is $\theta > 0$ such that $k^* = \lfloor \theta m^\beta \rfloor$ with $\beta \in [0,1)$, where $\lfloor\cdot\rfloor$ denotes integer part;

(b) $\sqrt{m}\,|\Delta_m| \to \infty$;

(c) $|\Delta_m| = O(1)$.

Part (a) of Assumption 3.1 specifies the order of the change-point $k^*$ as a power of $m$.
It is a standard assumption in the change-point literature. However, it should be noted
that the expression $k^* = \lfloor \theta m^\beta \rfloor$ is not unique for fixed $m$ and $k^*$, and different
specifications of $\theta$ and $\beta$ may lead to different limit distributions. A discussion of this matter can
be found in Section 3 of Fremdt [16]. Note also that parts (b) and (c) implicitly allow
for the decay of the sequence $|\Delta_m|$. The proofs show that the form of the limit
distribution of the stopping times depends then on the asymptotic behavior of the sequence
$|\Delta_m| m^{\gamma-1/2} (k^*)^{1-\gamma}$ of scaled drift terms. Due to part (a) of Assumption 3.1, which allows
for the re-expression of $k^*$ in terms of $m$, they depend consequently on the asymptotic
behavior of the scaled terms

\[ \tilde\Delta_m = |\Delta_m| m^{\beta(1-\gamma) - 1/2 + \gamma}, \]

which do not explicitly contain $k^*$ anymore. We distinguish between the three cases

(i) $\tilde\Delta_m \to 0$, (ii) $\tilde\Delta_m \to \tilde C_1 \in (0, \infty)$, (iii) $\tilde\Delta_m \to \infty$.

In case (ii), it follows from part (a) of Assumption 3.1 that
$|\Delta_m| m^{\gamma-1/2} (k^*)^{1-\gamma} \to \theta^{1-\gamma}\tilde C_1 = C_1 \in (0,\infty)$. For this scenario and any real $c$
define $d_1 = d_1(c)$ to be the unique solution of

\[ d_1 = 1 - \frac{c}{C_1} d_1^{1-\gamma}. \tag{3.9} \]

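In order to exhibit the asymptotic distribution of the stopping times, introduce first
the case-dependent distribution function $\Psi$ by setting, for all real arguments $u$,

\[ \Psi(u) = \begin{cases}
   \Phi(u), & \text{in case (i)}, \\
   P\bigl(\sup_{d_1 < x < 1} W(x) \leq u\bigr), & \text{in case (ii)}, \\
   \begin{cases} 0, & u < 0, \\ 2\Phi(u) - 1, & u \geq 0, \end{cases} & \text{in case (iii)},
   \end{cases} \]

where $\Phi$ denotes the standard normal distribution function. The next theorem gives the
large-sample behavior of $\tau_\mu(m)$ and $\tau_\mu^P(m)$.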

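Theorem 3.3. Let $(Y_t: t \in \mathbb{Z})$ follow the ARMA equations (2.1) so that (2.2) and (2.3)
hold, and suppose that Assumption 3.1 is satisfied for $\Delta_m = \Delta_m^\mu$. Then it holds under
$H_A^\mu$ for all real $u$ that

(a) \[ \lim_{m\to\infty} P\Bigl( \frac{\tau_\mu^P(m) - a_m(c_\alpha^P)}{b_m(c_\alpha^P)} \leq u \Bigr) = \Psi(u). \]

Additionally,

(b) \[ \lim_{m\to\infty} P\Bigl( \frac{\tau_\mu(m) - a_m(c_\alpha)}{b_m(c_\alpha)} \leq u \Bigr) = \Phi(u), \]

where $a_m(c)$ is the unique positive solution of

\[ a_m(c) = \Bigl( \frac{c\, m^{1/2-\gamma}}{|\Delta_m^\mu|} + \frac{k^*}{(a_m(c))^{\gamma}} \Bigr)^{1/(1-\gamma)} \tag{3.10} \]

and

\[ b_m(c) = \frac{\sigma_0 \sqrt{a_m(c)}}{|\Delta_m^\mu|} \Bigl( 1 - \gamma\Bigl(1 - \frac{k^*}{a_m(c)}\Bigr) \Bigr)^{-1}. \]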
The proof of Theorem 3.3 is in Section 5.2. Note that the uniqueness of $a_m(c)$ follows
from a rewriting of equation (3.10) to

\[ a_m(c) = \frac{c\, m^{1/2-\gamma}}{|\Delta_m^\mu|} (a_m(c))^{\gamma} + k^*. \]

Now it can be seen that $a_m(c)$ solves an equation of the form $x = ax^\gamma + b$ for appropriately
chosen $a > 0$, $b > 0$ and $\gamma \in [0,1/2)$. Since $a_m(c) > 0$, it is unique as the intersection of
the identity with a transformed power function whose exponent is smaller than one.
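
Because the rewritten equation has the form $x = ax^\gamma + b$ with exponent below one, its positive root can be located by a simple bracketing search. The Python sketch below is one way to do this, offered as an illustration rather than as the method used in the paper: the function names are hypothetical, bisection is a choice made here, and the drift, critical value and change-point are assumed to be known inputs.

```python
import math

def delay_centering(c, m, gamma, drift, k_star, tol=1e-10):
    """Positive root a of a = (c * m**(0.5 - gamma) / |drift|) * a**gamma + k_star."""
    coef = c * m ** (0.5 - gamma) / abs(drift)

    def f(a):
        return a - coef * a ** gamma - k_star

    lo, hi = float(k_star), max(2.0 * k_star, 1.0)
    while f(hi) <= 0.0:           # expand the bracket until f changes sign
        hi *= 2.0
    while hi - lo > tol * hi:     # bisection: f <= 0 at lo, f > 0 at hi
        mid = 0.5 * (lo + hi)
        if f(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delay_scaling(a, gamma, sigma, drift, k_star):
    """Scaling sigma * sqrt(a) / |drift| * (1 - gamma * (1 - k_star / a))**(-1)."""
    return sigma * math.sqrt(a) / abs(drift) / (1.0 - gamma * (1.0 - k_star / a))
```

For $\gamma = 0$ the equation is linear and the root is simply the sum of the scaled critical value and $k^*$, which provides an easy check of the search.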

A similar result can be obtained for the squared-residual procedures $\tau_\sigma(m)$ and $\tau_\sigma^P(m)$
after appropriate modification. The proof of the following theorem may also be found in
Section 5.2 below.

Theorem 3.4. Let $(Y_t: t \in \mathbb{Z})$ follow the ARMA equations (2.1) so that (2.2) and (2.3)
hold, and suppose that Assumption 3.1 is satisfied for $\Delta_m = (\Delta_m^\mu)^2$. Then, under $H_A^\mu$ and
for all real $u$, the limit results of Theorem 3.3 are retained if $\tau_\mu(m)$, $\tau_\mu^P(m)$ and $\sigma_0$ are
replaced with the respective objects $\tau_\sigma(m)$, $\tau_\sigma^P(m)$ and $\eta$.

Some discussion is in order. First, the limit distributions for Page's CUSUM and the
traditional CUSUM coincide for the early change scenario (i). Therefore, all procedures
work similarly in a large-sample setting. The critical values for the traditional CUSUM
are somewhat smaller than those for Page's CUSUM (comparing the tables in Horváth
et al. [19] with those of Fremdt [16]), giving it a slight edge for this case. However, the
limiting distributions are different for the intermediate and late change scenarios (ii) and
(iii), respectively. Here, Page's CUSUM outperforms the traditional CUSUM. This can be
explained by the fact that, unlike Page's CUSUM, the traditional CUSUM is not resetting
and so becomes less sensitive to a change the later it occurs after the onset of monitoring.

Monitoring ARMA time series

Second, in view of the last paragraph, Page's CUSUM is generally preferred for
applications unless the changes happen early. For the early change scenario (i) both procedures
perform alike in finite samples (based on simulations not reported in the paper), but
as the theoretical results indicate, the performance of the traditional CUSUM decays
noticeably for (ii) and (iii). In fact, this stopping rule often exhibits significant non-zero
probabilities of non-detection in intermediate and late change scenarios if the monitoring
period is not sufficiently long.

Third, the sensitivity of the test can be adjusted by the statistician through the choice
of $\gamma$. For example, it has been pointed out by Aue and Horváth [2] that the term $a_m(c)$
can be interpreted as the average delay time $E[\tau]$, where $\tau$ stands for any of the
stopping times under consideration. For the early change scenario (i), it follows then that
$E[\tau] \approx (c\, m^{1/2-\gamma}/|\Delta_m|)^{1/(1-\gamma)}$. This quantity becomes small if $\gamma$ is chosen close
to 1/2, thus ensuring a quicker detection. However, there is an obvious trade-off between
detection time and false alarm rates, with the latter increasing with increasing $\gamma$. Similar
computations can be obtained for cases (ii) and (iii) as well.

3.3. Limiting delay times for scale breaks 

In view of the applications, for which only changes in the scale are considered,
presentation in this section is focused on the case of a break in the scale parameter $\sigma$ only. All
other parameters are assumed to remain the same before and after the change occurs.
The section closes with remarks for the general case, but a more in-depth analysis is
beyond the scope of the present paper. The special case of the general alternative $H_A$, for
which only the scale parameter is subject to change, will be called $H_A^\sigma$ in the following.
A change of scale will induce the drift term

\[ \Delta_m^\sigma = (\delta_m^\sigma)^2 + 2\sigma_0\delta_m^\sigma \]

into the squared-residual procedures. If this drift term satisfies the regularity conditions
imposed through Assumption 3.1, then the asymptotic delay time distribution can be
quantified accordingly.

Theorem 3.5. Let $(Y_t: t \in \mathbb{Z})$ follow the ARMA equations (2.1) so that (2.2) and (2.3)
hold, and suppose that Assumption 3.1 is satisfied for $\Delta_m = \Delta_m^\sigma \neq 0$ and $\delta_m^\sigma = O(1)$.
Then the results of Theorem 3.4 remain valid under $H_A^\sigma$.

The proof of Theorem 3.5 is given in Section 5.3. The general case is much more
difficult to handle. The induced drift term will be a complicated function of the pre-break
parameters $\xi_0$ and post-break parameters $\xi_A$. In principle, the arguments developed in
order to verify the theorems of Sections 3.2 and 3.3 could be adjusted to this case.
However, one has to keep track of additional terms, the number of which may be growing
exponentially in the number of parameters. Given the complexity of the proofs, we refrain
from pursuing this direction further for this paper.

4. Applications

In order to demonstrate the proposed methodology in the finite sample setting, two case 
studies are provided in this section. The first involves an EEG data set considered in 
Davis et al. [13], the second is a classic data set on IBM stock given in Box et al. [9], 
previously analyzed for breaks in variance with retrospective methods. 


4.1. EEG data 

In this section, the proposed methodology is applied to two snapshots of a longer series of 
32 768 EEG measurements observed from a female patient diagnosed with left temporal 
lobe epilepsy.^ This is the “T3 channel” data of Davis et al. [13]. Measurements were taken 
at a sampling rate of 100 Hz (i.e., 100 observations per second), so that the recording 
took place over a time period of 5 minutes and 28 seconds. As explained in Davis et 
al. [13], expert analysis suggests the onset of an epileptic seizure at observation 18 500. 
Their (retrospective) segmentation procedure estimates the seizure onset at observation 
18 580. A similar analysis is reported in Ombao et al. [25]. Particular interest here is 
in two segments of the original data focusing on the interval 16 000-19 000 before and
immediately after the suspected seizure onset. These observations are plotted in Figure 1. 
A visual inspection of the time series plot indicates that the level of the observations 
remains roughly the same. There is, however, an apparent increase in the amplitude 
around time 18 500, perhaps indicating a scale break. 

To test for this possibility, the following two scenarios for the training period, both of
size $m = 1000$, were considered:

(TP1) Observations 16 001-17 000,

(TP2) Observations 17 001-18 000.

The training periods predate the epileptic seizure, with (TP1) implying a longer
monitoring period before the break occurrence than (TP2). The choices of training periods
make it possible to examine the effect of the change-point location on the monitoring procedures.
In each case, model selection procedures suggest nearly identical AR(4) models. Table 1
contains estimated parameter values for both training cases as well as the monitoring
observations immediately following the change-point suggested by the experts. To be
precise, the post-change period is

(PC) Observations 18 501-18 580.

All models were fit conditionally on four additional observations in the respective
windows. (E.g., in the case of (TP1), the $m + p = 1004$ observations 15 997-17 000 were used
for the estimation.) The tabulated estimates suggest the primary change occurs in the
innovation variance, while the dynamics of the series remain largely intact.

Figure 1. EEG data set.

Table 1

Case    $\hat\phi_{m,1}$    $\hat\phi_{m,2}$    $\hat\phi_{m,3}$    $\hat\phi_{m,4}$    $\hat\mu_m$        $\hat\sigma_m^2$
(TP1)   1.66 (0.03)    -0.79 (0.06)   -0.12 (0.06)   0.20 (0.03)    -207.2 (3.85)    63.1
(TP2)   1.64 (0.03)    -0.74 (0.06)   -0.13 (0.06)   0.18 (0.03)    -206.6 (4.90)    61.9
(PC)    1.46 (0.15)    -0.61 (0.27)   0.20 (0.27)    -0.18 (0.16)   -194.8 (12.53)   227.9

^ We thank Dr. Beth Malow (formerly Department of Neurology, University of Michigan) for providing
the data.

The proposed testing procedures were applied to the two training sets at the $\alpha = 0.05$
level. Critical values for the CUSUM procedure were obtained from Horváth et al. [19]
and critical values for Page's CUSUM procedure from Fremdt [16]. No changes were found
by the mean-only procedures $\tau_\mu(m)$ and $\tau_\mu^P(m)$ given in (3.3) and (3.5) when truncating
the tests at monitoring time point $10m$. The results for the general procedures $\tau_\sigma(m)$ and
$\tau_\sigma^P(m)$ are summarized for three choices of $\gamma$ in the column labeled "Stopping times" of
Table 2. For (TP1), both procedures terminate within two seconds after the suspected
onset of the change, for (TP2) within one second. Stopping times for (TP1) generally lag
behind stopping times for (TP2). Page's CUSUM detector displays faster detection for
both training periods.


Table 2. Summary of EEG stopping times and empirical values based on simulations from the
estimated model with 2500 iterations

                                       Simulated empirical values
                 Stopping times     95% upper limits    Medians             FRR
Case    $\gamma$      Page     CUSUM     Page     CUSUM      Page     CUSUM      Page     CUSUM
(TP1)   0        18 637   18 676    18 728   18 808     18 623   18 643     0.0240   0.0200
(TP1)   0.25     18 609   18 661    18 718   18 798     18 614   18 634     0.0580   0.0484
(TP1)   0.49     18 673   18 691    18 768   18 830     18 641   18 650     0.1344   0.1296
(TP2)   0        18 580   18 581    18 648   18 674     18 576   18 588     0.0036   0.0028
(TP2)   0.25     18 580   18 580    18 626   18 657     18 561   18 573     0.0288   0.0228
(TP2)   0.49     18 580   18 580    18 633   18 660     18 563   18 569     0.1140   0.1104

It should be noted that a sequential procedure does not provide an estimator for the
time of change. In general, it is a difficult problem to estimate the change-point after a
sequential procedure has terminated because the post-change sample is typically (much)
smaller than the pre-change sample. In the literature, Srivastava and Wu [30] and Wu
[33] have discussed options for this problem. It would be worthwhile to follow up on their
work elsewhere in the future.

Motivated by the EEG data, several simulations were conducted to further elaborate
on the distribution of the stopping times when a change occurs only in the
innovation variance. The simulations utilized an AR(4) model with $\mu = -207$ and
$\phi = (1.65, -0.75, -0.12, 0.18)'$. The innovations were distributed Laplace$(0, b_0 = 5.6)$ since this
closely described the behavior of the residuals from the EEG training models. Mimicking
the two cases from the EEG application, we used training sizes of $m = 1000$ and induced
changes in the variance by adjusting the scale parameter to $b_A = 10.7$ at time point
18 500 (i.e., monitoring time points 1500 and 500 for (TP1) and (TP2), respectively).
The choice of scale parameters implies the difference $\delta^\sigma = 7.21$. Table 2 provides
simulated empirical confidence limits, empirical median rejection times and false rejection
rates (FRR). The reported values have been adjusted to fit the time locations observed
in the EEG example. The reported stopping times for the EEG example all fall within
the empirical upper bounds from the simulation study. The large false rejection rates for
$\gamma = 0.49$ display the delay in convergence to the asymptotic levels suggested by Horváth
et al. [19] and Fremdt [16] when the sensitivity parameter is close to the upper boundary.
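
The size of the simulated scale change can be retraced with a short calculation, using the fact that a Laplace distribution with scale $b$ has variance $2b^2$. The Python sketch below does this; the function name is hypothetical, and the drift formula is the one stated for scale breaks in Section 3.3, which equals the difference of the post-change and pre-change variances.

```python
import math

def laplace_scale_change(b0, bA):
    """Standard deviations before and after, their difference, and the induced drift."""
    sigma0 = math.sqrt(2.0) * b0          # Laplace(0, b) has variance 2 * b**2
    sigmaA = math.sqrt(2.0) * bA
    delta = sigmaA - sigma0               # change in the scale parameter sigma
    drift = delta ** 2 + 2.0 * sigma0 * delta   # equals sigmaA**2 - sigma0**2
    return sigma0, sigmaA, delta, drift

sigma0, sigmaA, delta, drift = laplace_scale_change(5.6, 10.7)
```

With the scale parameters used in the simulations, the pre-change and post-change variances are about 62.7 and 229.0, close to the innovation variance estimates reported in Table 1, and the difference in standard deviations rounds to 7.21.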

4.2. IBM data 

The second application is a study of a classic data set which has been
previously studied for changes in the variance, albeit in a retrospective setting. The
observations are on the IBM common stock daily closing prices from May 17, 1961 to
November 2, 1962. This is Series B as reported in Box et al. [9]. The data set contains
369 observations and has been examined in several retrospective studies which focused
primarily on changes in the variance. Several authors have detected two change-points.
Inclán and Tiao [22] detected change-points at observations 235 and 279 using their ICSS
algorithm, Baufays and Rasson [7] proposed 235 and 280, Wichern et al. [32] gave 180
and 235, while Tsay [31] reported only one change at observation 237. As previously
suggested in order to stabilize the variance, the first difference of the log transformed
series will be analyzed. Figure 2 displays the corresponding time series plot. It can be
seen that fluctuations appear to be around a constant level, while amplitudes are larger
for roughly the last third of the observations.

To estimate an initial model, the training period is selected to consist of the first
$m = 200$ observations. Two competing models were identified based on AIC and model
selection diagnostic plots. The competing fits are the ARMA(2,2) and AR(4)
estimated models summarized in Table 3, with the AIC value being slightly smaller for
the ARMA(2,2) model.

Figure 2. Plot of transformed IBM data set.


The proposed procedures were applied at the $\alpha = 0.05$ level, utilizing both model fits.
Monitoring commences at observation 201. The mean-only procedures $\tau_\mu(m)$ and $\tau_\mu^P(m)$
do not detect deviations from a constant level. Table 4 provides the observed values
for the general stopping rules $\tau_\sigma(m)$ and $\tau_\sigma^P(m)$. Depending on the choice of $\gamma$, both
procedures report a change has occurred at or near observation 238. For comparison
purposes, a simulation study was conducted and is also summarized in Table 4. The
simulations generated training data from the observed ARMA(2,2) model. A change was
induced at time point 235 to reflect the observed instability in the IBM example. Based
on observations 235-279 (retrospective studies suggest stability over this period), the
best fitting model was a white noise process with innovation variance given by 0.00135.
The empirical measures from the simulation study are similar when assuming the correct
ARMA model orders, as well as when the AR(4) is assumed. This highlights an important
feature. Models with nearly identical MA($\infty$) representations exhibit similar behavior
with respect to our proposed methodology. For our observed ARMA(2,2) and AR(4)
models, Figure 3 displays the differences in the initial MA($\infty$) coefficients.


5. Proofs 

5.1. Preliminaries 

The following auxiliary result will be used frequently. It establishes the behavior of the 
coefficients 7 rj(v) and V'ilu) in (3.6) if instead of the true parameter vectors (p and 6, 


Table 3. Summary of IBM modeling. Standard errors in parenthesis 


Model AIC Pm .,1 0m ,2 ^7n,3 0m,4 ^m,l ^m ,2 


ARMA(2,2) -1296 -0.40 (0.13) -0.68 (0.11) - - 0.67 (0.12) 0.76 (0.10) 8.5e-05 

AR(4) -1292 0.26 (0.07) -0.12 (0.07) -0.10 (0.07) 0.16 (0.07) - - 8.7e-05 
Table 4. Summary of IBM stopping times and empirical values based on simulations from the
estimated model with 2500 iterations

                                      Simulated empirical values
                  Stopping times      95% upper limits    Medians           FRR
Case       $\gamma$    Page    CUSUM       Page    CUSUM       Page    CUSUM     Page     CUSUM
ARMA(2,2)  0      238     239         244     244         238     238       0.0024   0.0024
           0.25   238     238         242     242         237     237       0.0228   0.0216
           0.49   238     238         241     242         236     237       0.1008   0.1072
AR(4)      0      239     242         244     244         238     239       0.0004   0.0004
           0.25   238     238         242     243         237     237       0.0120   0.0100
           0.49   238     238         241     242         237     237       0.0848   0.0904


generic elements $v \in \mathbb{R}^p$ and $u \in \mathbb{R}^q$ in their vicinity are used for the respective power
series expansions. Let $|\cdot|$ denote the maximum norm of vectors.

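Proposition 5.1. Let the observations follow the ARMA equations (2.1) so that (2.2) holds.
Let $v \in \mathbb{R}^p$ and $u, u_1, u_2 \in \mathbb{R}^q$. Then there are $\varepsilon > 0$, $c \in (0,1)$ and $K > 0$ such that, for
all $j \ge 0$,

(a) $|\pi_j(v)| \le K c^j$, if $|v - \phi| \le \varepsilon$;

(b) $|\psi_j(u)| \le K c^j$, if $|u - \theta| \le \varepsilon$;

(c) $|\psi_j(u_1) - \psi_j(u_2)| \le K |u_1 - u_2|\, j c^{j-1}$, if $|u_1 - \theta| \le \varepsilon$ and $|u_2 - \theta| \le \varepsilon$.

Proof. The proof of these statements can be found in Bai [5]. □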
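A geometric bound of the type in part (b) can be illustrated numerically. The sketch below assumes that $\psi_j(u)$ are the power series coefficients of $1/(1 + u_1 z + \cdots + u_q z^q)$, uses the moving average estimates of the ARMA(2,2) fit in Table 3 as the center of the neighborhood, and picks hypothetical constants $\varepsilon = 0.01$, $K = 1.1$ and $c = 0.88$ that are not taken from the paper.

```python
from itertools import product


def inverse_ma_coefficients(u, n_coef):
    """Coefficients psi_j(u) of 1 / (1 + u_1 z + ... + u_q z^q) (assumed definition)."""
    psi = [1.0]
    for j in range(1, n_coef):
        psi.append(-sum(u[i - 1] * psi[j - i]
                        for i in range(1, min(j, len(u)) + 1)))
    return psi


theta = (0.67, 0.76)          # MA estimates of the ARMA(2,2) fit in Table 3
eps, K, c = 0.01, 1.1, 0.88   # hypothetical constants for illustration only


def bound_holds(u, n_coef=200):
    """Check |psi_j(u)| <= K * c**j for j = 0, ..., n_coef - 1."""
    coefficients = inverse_ma_coefficients(u, n_coef)
    return all(abs(value) <= K * c ** j for j, value in enumerate(coefficients))


# Grid of points u with max-norm distance at most eps from theta.
neighborhood = [tuple(t + sign * eps for t, sign in zip(theta, signs))
                for signs in product((-1, 0, 1), repeat=2)]
all_ok = all(bound_holds(u) for u in neighborhood)
print(all_ok)
```

The check only covers a finite grid and finitely many lags, so it illustrates the statement rather than proving it. For a non-invertible choice such as `u = (0.67, 1.2)` the coefficients grow and `bound_holds` fails.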



Figure 3. Comparing MA($\infty$) coefficients from the observed ARMA(2,2) (filled) and AR(4)
(open) models.
Throughout the proofs, let $\lambda_t$ denote the difference between the residuals $\hat\varepsilon_t$ and the
innovations $\varepsilon_t$ if the null hypothesis $H_0$ is valid. Since none of the parameters is subject
to change, it holds then that

$$\lambda_t = \lambda_t\big(\sqrt{m}[\hat\theta_m - \theta_0], \sqrt{m}[\hat\phi_m - \phi_0], \sqrt{m}[\hat\mu_m - \mu_0]\big), \tag{5.1}$$

where

$$\lambda_t(u,v,w) = \zeta_t(u) + \frac{\beta_t(u,v)}{\sqrt{m}} + \frac{\rho_t(u,v,w)}{\sqrt{m}}.$$

To define the quantities on the right-hand side of the latter equality, let first \i* = 6 + 
u/and v* =4> + xj and set Ug = 0. Then 


Ct(u) 


/3t(u,v) 


pt{u,v,w) 


j=i \i=i 

p t-l q t-1 




t-l 

£=0 


This is the decomposition given in Yu [34], which is useful to derive the limit distributions
of the various test procedures under the null hypotheses as given in Theorems 3.1 and
3.2. This was done for the CUSUM-type procedure in Dienes and Aue [14], but the same
approach also works for the procedure based on Page's CUSUM using the work of Fremdt
[15, 16].

To prove the new results on the asymptotic delay time distribution of the stopping
times, one may modify methodology developed in Aue and Horvath [2]: It is subsequently
shown that sequences $N = N(m,x)$ can be found such that, for the stopping time $\tau$ with
corresponding detector $D(m,k)$ and threshold $g(m,k)$, it holds that

$$P(\tau > N) = P\Big(\max_{1 \le k \le N} \frac{D(m,k)}{g(m,k)} \le 1\Big)$$

converges to the appropriate limit distribution. The standardizations for $\tau$ in the various
theorems are then implied by the definition of $N$. The next section contains the
verification for the mean break case.
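The relation between the event $\{\tau > N\}$ and the maximum of the ratio of detector and threshold can be made concrete with a small deterministic example. The sketch below is illustrative only: the boundary function $g(m,k) = c\sqrt{m}(1 + k/m)(k/(m+k))^{\gamma}$ is the form commonly used in this literature and is assumed here because the definition lies outside this section, the detector is a simplified CUSUM of centered observations, and all function names are hypothetical.

```python
import math


def threshold(m, k, gamma, c):
    """Assumed boundary function g(m, k) = c * sqrt(m) * (1 + k/m) * (k/(m+k))**gamma."""
    return c * math.sqrt(m) * (1.0 + k / m) * (k / (m + k)) ** gamma


def cusum_detector(training, monitoring):
    """Absolute partial sums of the monitoring data, centered by the training mean."""
    mean = sum(training) / len(training)
    values, running = [], 0.0
    for x in monitoring:
        running += x - mean
        values.append(abs(running))
    return values


def stopping_time(detector, m, gamma, c):
    """First k with D(m, k) > g(m, k); math.inf if the threshold is never crossed."""
    for k, d in enumerate(detector, start=1):
        if d > threshold(m, k, gamma, c):
            return k
    return math.inf


def no_alarm_until(detector, m, gamma, c, n):
    """Event {max_{1 <= k <= n} D(m, k) / g(m, k) <= 1}."""
    return max(d / threshold(m, k, gamma, c)
               for k, d in enumerate(detector[:n], start=1)) <= 1.0


m = 100
training = [0.0] * m
monitoring = [0.0] * 10 + [1.0] * 40   # level shift after 10 monitoring observations
D = cusum_detector(training, monitoring)
tau = stopping_time(D, m, gamma=0.0, c=1.0)
print(tau, no_alarm_until(D, m, 0.0, 1.0, 20), no_alarm_until(D, m, 0.0, 1.0, 30))
```

In this toy setting the two descriptions of the event agree for every horizon: the stopping time exceeds `n` exactly when the maximal ratio up to `n` is at most one.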


5.2. Proofs of the results in Section 3.2 

For the mean break case, changes in the second order parameters $\phi$, $\theta$ and $\sigma$ are
precluded. To determine the effect of the mean break on the differences $\hat\varepsilon_t - \varepsilon_t$, one
consequently needs to check only the terms including the mean. It can be seen from (5.1)
that these terms only enter through $\rho_t$. To determine the drift induced by the change in
mean under $H_A$, a similar decomposition to (5.1) is needed. Following equation (14) in
Yu [34], it follows that


£t — £t — Ct(^m) + 


y/m 


t-i 

e=o 


{fim 


P 

i=i 


(5.2) 


where $\zeta_t(\hat\theta_m) = \zeta_t(\sqrt{m}[\hat\theta_m - \theta_0])$ and $\beta_t(\hat\theta_m, \hat\phi_m) = \beta_t(\sqrt{m}[\hat\theta_m - \theta_0], \sqrt{m}[\hat\phi_m - \phi_0])$ are
respectively the terms of initialization effects and the partial sums of centered observations
and innovations. To derive (5.2), one uses the recursiveness of the difference $\hat\varepsilon_t - \varepsilon_t$
and the invertibility of the underlying ARMA process. Now, as under $H_A$ a change occurs
only in $\mu_t$ for $t \ge m + k^*$, it suffices to investigate the term


t-i 




e=o 


i=i 

C P \ t-i t-i 

1 ~ 'y ^ ^m,jj (Am ~ Mo) 'y ^ '0^(^m) + ^ ^ 

m 7 m 7 M+Ar_ 


t=0 


t=0 




i=i 


^t — TYl — k* ’ 


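where $\rho_t(\hat\theta_m, \hat\phi_m, \hat\mu_m) = \rho_t(\sqrt{m}[\hat\theta_m - \theta_0], \sqrt{m}[\hat\phi_m - \phi_0], \sqrt{m}[\hat\mu_m - \mu_0])$ and $I_{t,j,\ell}$ is short
for $I_{\{t-j-\ell \ge k^*+m\}}(t,j,\ell)$. Here, $I_A$ denotes the indicator function of a set $A$. Letting
$t \ge m + k^*$ and $s = t - m - k^*$, the drift term can be written as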




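$$\Delta^{\mu}_{s} = \delta_\mu \sum_{\ell=0}^{t-1} \psi_\ell(\hat\theta_m) \Big( I_{t,0,\ell} - \sum_{j=1}^{p} \hat\phi_{m,j} I_{t,j,\ell} \Big)
= \begin{cases}
0, & s < 0, \\[4pt]
\delta_\mu \displaystyle\sum_{\ell=0}^{s} \psi_{s-\ell}(\hat\theta_m) \Big( 1 - \sum_{j=1}^{\ell} \hat\phi_{m,j} \Big), & 0 \le s < p, \\[4pt]
\delta_\mu \Big[ \Big( 1 - \displaystyle\sum_{j=1}^{p} \hat\phi_{m,j} \Big) \sum_{\ell=0}^{s-p} \psi_\ell(\hat\theta_m) + \sum_{\ell=0}^{p-1} \psi_{s-\ell}(\hat\theta_m) \Big( 1 - \sum_{j=1}^{\ell} \hat\phi_{m,j} \Big) \Big], & s \ge p.
\end{cases} \tag{5.3}$$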


Note that the drift has been rescaled, so that $s < 0$ indicates that the change has not yet
occurred. The further distinction into the cases $0 \le s < p$ and $s \ge p$ takes into account
the autoregressive order. It follows that

$$\hat\varepsilon_t - \varepsilon_t = \lambda_t + \Delta^{\mu}_{t-m-k^*}, \tag{5.4}$$

with $\lambda_t$ from (5.1). To prove the theorems of Section 3.2, it remains to analyze partial
sums of the $\hat\varepsilon_t - \varepsilon_t$ and compare them to the growth of the threshold $g_\gamma(m,k)$.
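The case distinction in (5.3) can be cross-checked against a direct computation in which a level shift is passed through the residual filter. The sketch below is an illustration under stated assumptions: $\psi_\ell$ is taken to be the $\ell$th power series coefficient of $1/\theta(z)$, the size of the mean shift is a hypothetical `delta = 1.0`, the parameter values are the ARMA(2,2) estimates of Table 3, and all function names are made up for this example.

```python
def inverse_ma_coefficients(theta, n_coef):
    """Coefficients psi_0, psi_1, ... of 1 / (1 + theta_1 z + ... + theta_q z^q)."""
    psi = [1.0]
    for j in range(1, n_coef):
        psi.append(-sum(theta[i - 1] * psi[j - i]
                        for i in range(1, min(j, len(theta)) + 1)))
    return psi


def incomplete_ar_sum(phi, upper):
    """1 - sum_{j=1}^{upper} phi_j, with an empty sum for upper = 0."""
    return 1.0 - sum(phi[:upper])


def drift_by_cases(s, delta, phi, psi):
    """Drift at lag s = t - m - k*, using the three cases s < 0, 0 <= s < p, s >= p."""
    p = len(phi)
    if s < 0:
        return 0.0
    if s < p:
        return delta * sum(psi[s - l] * incomplete_ar_sum(phi, l) for l in range(s + 1))
    head = incomplete_ar_sum(phi, p) * sum(psi[: s - p + 1])
    tail = sum(psi[s - l] * incomplete_ar_sum(phi, l) for l in range(p))
    return delta * (head + tail)


def drift_by_filter(s, delta, phi, psi):
    """Same drift, obtained by filtering the step function delta * 1{r >= 0}."""
    def shift(r):
        return delta if r >= 0 else 0.0

    total = 0.0
    for l in range(s + 1):
        ar_filtered = shift(s - l) - sum(phi[j - 1] * shift(s - l - j)
                                         for j in range(1, len(phi) + 1))
        total += psi[l] * ar_filtered
    return total


phi_hat = [-0.40, -0.68]
theta_hat = [0.67, 0.76]
psi_hat = inverse_ma_coefficients(theta_hat, 300)
max_gap = max(abs(drift_by_cases(s, 1.0, phi_hat, psi_hat)
                  - drift_by_filter(s, 1.0, phi_hat, psi_hat))
              for s in range(-3, 250))
limit = (1.0 - sum(phi_hat)) / (1.0 + sum(theta_hat))
print(max_gap, drift_by_cases(249, 1.0, phi_hat, psi_hat), limit)
```

Under these assumptions both computations agree up to rounding, and for large lags the drift settles at `delta` times the ratio of the autoregressive and moving average polynomials evaluated at one.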


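Proof of Theorem 3.3. Let $k \ge k^*$ and $M = k - k^*$. Utilizing $\lambda_t$ from (5.1) and display
(5.4), it follows that

$$\sum_{t=m+k^*}^{m+k} (\hat\varepsilon_t - \varepsilon_t) = \sum_{t=m+k^*}^{m+k} (\lambda_t + \Delta^{\mu}_{t-m-k^*}) = \sum_{t=m+k^*}^{m+k} \lambda_t + \sum_{s=0}^{M} \Delta^{\mu}_{s}.$$

The first term on the right-hand side can be treated as under the null hypothesis, see
Dienes and Aue [14]. The drift of the cumulative sum procedure can be determined as
follows. First, for $M < p$, (5.3) implies directly that

$$\sum_{s=0}^{M} \Delta^{\mu}_{s} = \delta_\mu \sum_{s=0}^{M} \sum_{\ell=0}^{s} \psi_{s-\ell}(\hat\theta_m) \Big( 1 - \sum_{j=1}^{\ell} \hat\phi_{m,j} \Big).$$
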


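Second, for $M \ge p$, another application of (5.3) using the cases for $0 \le s < p$ and $p \le s \le
M$ to split up the sum and subsequently combining the terms involving the incomplete
sums $1 - \sum_{j=1}^{s} \hat\phi_{m,j}$ of estimated autoregressive coefficients, yields

$$\sum_{s=0}^{M} \Delta^{\mu}_{s} = \delta_\mu \Big[ \Big( 1 - \sum_{j=1}^{p} \hat\phi_{m,j} \Big) \sum_{\ell=0}^{M-p} \psi_\ell(\hat\theta_m) [(M-p+1) - \ell] + \sum_{s=0}^{p-1} \Big( 1 - \sum_{j=1}^{s} \hat\phi_{m,j} \Big) \sum_{\ell=0}^{M-s} \psi_\ell(\hat\theta_m) \Big]$$
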


= (M - p + 1) + [Ai (M) - (A/) + Ag (M)], 


where /Omil) with ^miX) = 1 - ^m,iz - $m,pZ^ and 0^(1) = 1 + 

dm,iz H-h 9m,qz‘^, and 


AXM) = {M-p+l)il-J2^mj 

\ i=i / 


M—P+1 


M—p 




i=i 




p—l / ^ \ 

AXM) =Y(^-Y E 

s=0 \ j=l ) l=Q 
It is clear that the ratio $\hat\phi_m(1)/\hat\theta_m(1)$ will be close to its deterministic equivalent if $m$ is large. The
terms $A_1(M)$, $A_2(M)$ and $A_3(M)$ are stochastically bounded, so that Proposition 5.1
implies that, as $m \to \infty$,



7 - 1/2 


max 

k*<k<N 


5Mk-k*) 

gj{m,k) 


op(l), 


i = l, 2 , 3 . 


if the sequence $N = N(m,x)$ given in Fremdt [16] is used as the upper bound for the
maximum. The rest of the proof of part (a) of the theorem now follows analogously to
the proof of Theorem 2.2 in Fremdt [16].

Part (b) can be verified by an extension of the proof in Aue and Horvath [2], relaxing
their assumption on the order of the change-point to the requirement of part (a) in
Assumption 3.1. This can be done with the sharper estimates developed in Fremdt [16].
Further details are omitted to conserve space. □
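The linear growth of the cumulative drift used in the proof of Theorem 3.3 can be illustrated numerically. The sketch below rests on the same assumptions as the text's setting allows one to guess: $\psi_\ell$ are the coefficients of $1/\theta(z)$, the mean shift has hypothetical size `delta = 1.0`, and the ARMA(2,2) estimates of Table 3 serve as parameters. It compares the cumulative drift with the leading term $(M - p + 1)\,\phi(1)/\theta(1)$ and reports the remainder; the helper names are hypothetical.

```python
def inverse_ma_coefficients(theta, n_coef):
    """Coefficients psi_0, psi_1, ... of 1 / (1 + theta_1 z + ... + theta_q z^q)."""
    psi = [1.0]
    for j in range(1, n_coef):
        psi.append(-sum(theta[i - 1] * psi[j - i]
                        for i in range(1, min(j, len(theta)) + 1)))
    return psi


def cumulative_drift(M, delta, phi, psi):
    """sum_{s=0}^{M} Delta_s = delta * sum_{l=0}^{M} c_l * sum_{n=0}^{M-l} psi_n,

    where c_l = 1 - sum_{j=1}^{min(l, p)} phi_j is the incomplete AR sum.
    """
    p = len(phi)
    incomplete = [1.0 - sum(phi[:min(l, p)]) for l in range(M + 1)]
    cum_psi = [0.0]                       # cum_psi[k] = psi_0 + ... + psi_{k-1}
    for value in psi[:M + 1]:
        cum_psi.append(cum_psi[-1] + value)
    return delta * sum(incomplete[l] * cum_psi[M - l + 1] for l in range(M + 1))


phi_hat = [-0.40, -0.68]
theta_hat = [0.67, 0.76]
p = len(phi_hat)
psi_hat = inverse_ma_coefficients(theta_hat, 1100)
slope = (1.0 - sum(phi_hat)) / (1.0 + sum(theta_hat))   # phi(1) / theta(1)

remainders = {M: cumulative_drift(M, 1.0, phi_hat, psi_hat) - (M - p + 1) * slope
              for M in (10, 300, 1000)}
for M, value in remainders.items():
    print(M, round(value, 6))
```

In this example the remainder stabilizes as $M$ grows, so that the cumulative drift is dominated by the term that is linear in $M$.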


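Proof of Theorem 3.4. To investigate the behavior of the general detectors under $H_A$,
the previous proof needs to be adjusted for the squared residuals. From (5.4) it follows
that, for $t \ge m + k^*$,

$$\hat\varepsilon_t^2 - \varepsilon_t^2 = \lambda_t^2 + 2\lambda_t \varepsilon_t + (\Delta^{\mu}_{t-m-k^*})^2 + 2\Delta^{\mu}_{t-m-k^*} (\varepsilon_t + \lambda_t). \tag{5.5}$$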
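The decomposition in (5.5) is a purely algebraic consequence of writing the residual as innovation plus null-hypothesis difference plus drift. A minimal sketch with exact rational arithmetic confirms the identity on a small grid; the function names and the grid are hypothetical and only serve as illustration.

```python
from fractions import Fraction
from itertools import product


def squared_difference(eps, lam, drift):
    """Left-hand side: squared residual minus squared innovation."""
    return (eps + lam + drift) ** 2 - eps ** 2


def decomposition(eps, lam, drift):
    """Right-hand side: null terms plus squared drift plus cross term."""
    return lam ** 2 + 2 * lam * eps + drift ** 2 + 2 * drift * (eps + lam)


grid = [Fraction(n, 4) for n in range(-4, 5)]
identity_holds = all(squared_difference(e, l, d) == decomposition(e, l, d)
                     for e, l, d in product(grid, repeat=3))
print(identity_holds)
```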

The first two terms on the right-hand side can again be treated as under the null hypothesis.
The relevant drift term for the sequential procedures consists then of the partial sums
of $(\Delta^{\mu}_{t-m-k^*})^2$ and $2\Delta^{\mu}_{t-m-k^*}(\varepsilon_t + \lambda_t)$, of which the latter will be negligible. To verify
this claim, observe first that


max 

k* </c<oo k 


-Y.v_,\=op(i) 


i =0 


since, on account of the strong law of large numbers, ^ 1^*1 converges almost surely 

as fc —>• oo. Because 


N 

max 
k*<k<N V m 


7 - 1/2 


gpf{m,k) 


= o(l) (m—>-00), 


it follows from Proposition 5.1 that 


m J 


m+k 


max / / , X 

k-<k<N ^' qp^(m,k) 
- t=m+k- ’ 


= Op(l) (m—>-00). 


Utilizing the definition of $\Delta^{\mu}_{s}$ in (5.3) and another application of Proposition 5.1 in
combination with Lemmas 6.1-6.3 of Dienes and Aue [14] yields also that


7 — 1/2 m+k am \ 

---^=op(l) (m—T^oo). 

qp,(m,k) 

- t=m+k* ’ ' 
It therefore remains to extract the dominating term from the partial sums of $(\Delta^{\mu}_{t-m-k^*})^2$.

To facilitate notation, the abbreviations ipg. = '4’ii^m)^ (z) = l — 0m,— ... — 

£ = l,...,p—1, and k' = k* + m+ p are used. Then, 


m-\-k 

t—k' 


m-\-k /t—k' \ ^ m+k /p—1 

0m(l) XI ( ) + X ( 

t^k' \£^0 / t^k' 

m+fc /t — k' \ / p—1 \ " 

- 20m(l) X f X f X 0m I 


t=fc' \£=0 


’'=0 


Similar arguments to those used in the proof of Theorem 3.3 yield that only the first 
term needs to be investigated. Since 


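$$\sum_{s=0}^{t} \Big( \sum_{\ell=0}^{s} \psi_\ell \Big)^2 = (t+1) \Big( \sum_{\ell=0}^{\infty} \psi_\ell \Big)^2 - 2 \Big( \sum_{\ell=0}^{\infty} \psi_\ell \Big) \sum_{s=0}^{t} \sum_{\ell=s+1}^{\infty} \psi_\ell + \sum_{s=0}^{t} \Big( \sum_{\ell=s+1}^{\infty} \psi_\ell \Big)^2$$

$$= (t+1) \Big( \sum_{\ell=0}^{\infty} \psi_\ell \Big)^2 - 2 \Big( \sum_{\ell=0}^{\infty} \psi_\ell \Big) \Big[ \sum_{s=1}^{t} s \psi_s + (t+1) \sum_{s=t+1}^{\infty} \psi_s \Big] + \sum_{s=0}^{t} \Big( \sum_{\ell=s+1}^{\infty} \psi_\ell \Big)^2,$$

following the arguments of the proof of Theorem 3.3 implies that $(t+1)\hat\theta_m^{-2}(1)$ is the
dominating term in this expression. The rest follows analogously to the proof of Theorem 2.2
in Fremdt [16]. □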
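The expansion of the sum of squared partial sums and the dominance of its first term can be verified numerically. The sketch below assumes that $\psi_\ell$ are the power series coefficients of $1/\theta(z)$ for the moving average estimates of the ARMA(2,2) fit in Table 3, truncates the infinite sums at a hypothetical lag of 3000 (where the coefficients are numerically zero), and uses made-up function names.

```python
def inverse_ma_coefficients(theta, n_coef):
    """Coefficients psi_0, psi_1, ... of 1 / (1 + theta_1 z + ... + theta_q z^q)."""
    psi = [1.0]
    for j in range(1, n_coef):
        psi.append(-sum(theta[i - 1] * psi[j - i]
                        for i in range(1, min(j, len(theta)) + 1)))
    return psi


theta_hat = [0.67, 0.76]
TRUNCATION = 3000                      # hypothetical cut-off for the infinite sums
psi = inverse_ma_coefficients(theta_hat, TRUNCATION)
total = sum(psi)                       # approximates 1 / theta(1)

# suffix[k] = psi_k + psi_{k+1} + ... (up to the truncation point)
suffix = [0.0] * (TRUNCATION + 1)
for k in range(TRUNCATION - 1, -1, -1):
    suffix[k] = suffix[k + 1] + psi[k]


def lhs(t):
    """sum_{s=0}^{t} (psi_0 + ... + psi_s)^2."""
    accumulated, running = 0.0, 0.0
    for s in range(t + 1):
        running += psi[s]
        accumulated += running ** 2
    return accumulated


def rhs(t):
    """(t+1) S^2 - 2 S [sum_{s=1}^{t} s psi_s + (t+1) tail_t] + sum of squared tails."""
    weighted = sum(s * psi[s] for s in range(1, t + 1))
    squared_tails = sum(suffix[s + 1] ** 2 for s in range(t + 1))
    return ((t + 1) * total ** 2
            - 2.0 * total * (weighted + (t + 1) * suffix[t + 1])
            + squared_tails)


for t in (0, 5, 50, 500):
    print(t, lhs(t), rhs(t))
print(lhs(2000) / 2001, 1.0 / (1.0 + sum(theta_hat)) ** 2)
```

Both sides agree up to rounding, and the normalized sum approaches $\theta^{-2}(1)$, in line with the claim that the term proportional to $t + 1$ dominates.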


5.3. Proofs of the results in Section 3.3 

Denote by $(z_t : t \in \mathbb{Z})$ the sequence of independent, identically distributed and standardized
random variables given by the requirement $\varepsilon_t = \sigma_t z_t$ for all $t \in \mathbb{Z}$. Therefore changes
in scale do not affect the $z_t$'s. In the following, the subscript 0 in the quantities $y_{0,t}$, $x_{0,t}$
and $\varepsilon_{0,t}$ will indicate that the corresponding random variables are generated according
to the null parameter vector.

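Precluding a break in the mean, autoregressive and moving average parameters and
only allowing breaks in the scale parameter leads to the decomposition

$$\hat\varepsilon_t^2 - \varepsilon_{0,t}^2 = \lambda_t^2 + 2\lambda_t \varepsilon_{0,t} + (\Delta^{\sigma}_{t-m-k^*})^2 + 2\Delta^{\sigma}_{t-m-k^*} (\varepsilon_{0,t} + \lambda_t), \tag{5.6}$$

for $t \ge m + k^*$, which is analogous to (5.5). Now $\Delta^{\sigma}_{t-m-k^*}$ can be further decomposed
into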


A 


t — m — k* 


+ Bt). 
Using that, for $t \ge m + k^*$,

Xt-Xo^t=S^l TTk{(f>A) 

\ k=0 

and setting again s = t — m — k* gives 


/t—m—k 


s-q 


^t — k~\~ ^ ^ ^0,j ^t—j — k 


q—1 k 


^ ^m,j) ^~^^ '4^k^t—j — k ^ ^ *05 — ^m,j)^m-\-k*—j-\-k 

j—1 k—0 k—lj—1 

p s-j s-j-k 

^ ^m,j)'^^'4^k ^ ^ '^n{^Q)^t—j — k—n 

j—1 k—0 n—0 

p q s-js-j-k-q 

^ ^m,j) ^0,j ^ ^ ^ ^ (0o)^^—J — 

j=l £^1 k^O n^O 

P s—j q—1 n 

^ ^ {4^0,j ^mj') ^ ^ '4^k ^ ^ j —fc—n (0o) ^ ^ ^0,l^m-\-k* -\-n—l 

j=l /c=0 n=l i^l 


= ^l,t + --- + ^, 


5,i ? 


where ijik ='^k(4^o) =0 for k <0. The following lemma identifies the dominating term in 
the partial sums of 


Lemma 5.1. Under the assumptions of Theorem 3.5, 


N 


7 - 1/2 


max 


m J k*<k<N gj{m, k) 


m-\-k 


m-\-k 


Y Y 


t—m-\-k* 


t—m-\-k* 


= Op(l) 


as m —>■ c». 


Proof. It suffices to examine the quantities on the right-hand side of (5.6). Notice first
that $\lambda_t^2 + 2\lambda_t \varepsilon_{0,t}$ contains only terms related to the behavior under the null hypothesis.
For the next two terms on the right-hand side of (5.6), write


(A^) -I-2A7eo,t + At) 

= -|- 2AfA7 


(5.7) 


The first term is the dominating term. Since = (J))^)^ -|- 2cro^mi f^e assertion of the 
lemma will follow if the remaining terms can be shown to be negligible. For the second 
term notice that


ZtBt ^ (Ny-^/^ \zt\\Bt\ 

— max > — - -— < — > — - -— 

m) k*<k<N ^' qAm.k) \mj ^—' qAm.k*] 

^ - t=m+k‘ ^ J \ / t=m+fc* ^ 


op(l), 


since zt and Bt are independent and ymE[Bt] < oo, following the arguments used in 
Dienes and Aue [14]. For the third term in (5.6), observe that 

max y ^ y bi^ 


The proof is only detailed for $\ell = 1$, since all other terms can be handled in a similar
fashion. For this case,


7 - 1/2 


m-\-N / q 


N 


ml q^im.k*) 

/ 1 t=m+k* \j=l 


s-q 


♦ I ^ j ) i’kZt-j-k 


k=0 


N\ 


7 - 1/2 


m-\-N q 


<1 — 1 

m 


) q^(m,k*) 

/ a7V J 1 


^kZt-j-k 


Kk=Q 


Now the arguments of Lemma 5.2 in Dienes and Aue [14] apply and yield the $o_P(1)$ rate.
For the last term in (5.6) there is nothing to show, since


Ag At — SyXtZt + XtBt) < SyXtZt + At + B^) 


and all these terms have already been shown to be negligible. The proof is complete. □ 

Proof of Theorem 3.5. The relevant drift term has been identified in Lemma 5.1.
Noticing that the law of the iterated logarithm implies that, for all i5 G (0,1/2) and as 
m —?► oo. 



7-1/2 ^ 

max — - --- 

k-<k<N g^{m, k) 


m-\-k 

Y. (-<i) 


= Op(l) 


7 / 0 I /2 7+^ 

TTTl^ - max 7-;-rr;- 

_/yi/ 2-7 i<k<N-k* (m + k* + m)i“T' 


Op(l)> 


the assertion of the theorem follows. 


□ 